Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix negative expires_ms and avoid worker freezing while using cron #479

Merged
merged 4 commits into from
Jan 6, 2025

Conversation

Matvey-Kuk
Copy link
Contributor

@Matvey-Kuk Matvey-Kuk commented Sep 11, 2024

We're encouraging issue with negative expires_ms while excessively using frequent cron jobs. It's happening rarely, but is causing worker freezes which is pretty annoying in production.

The issue is also mentioned here: #447

To reproduce:

  1. Schedule a cron task.
  2. Put a debugger breakpoint to https://github.com/python-arq/arq/blob/main/arq/worker.py#L752 (emulating some delay here: https://github.com/python-arq/arq/blob/main/arq/worker.py#L724)
  3. It will cause _defer_until to be in the past.
  4. It will cause expires_ms to be negative here: https://github.com/python-arq/arq/blob/main/arq/connections.py#L170
  5. Raises
Task exception was never retrieved
future: <Task finished name='Task-5' coro=<Worker.async_run() done, defined at /Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py:315> exception=ResponseError('Command # 1 (PSETEX arq:job:cron:process_background_ai_task:1726036281123 -319 �\x04�H\x00\x00\x00\x00\x00\x00\x00}�(�\x01t�N�\x01f��\x1fcron:process_background_ai_task��\x01a�)�\x01k�}��\x02et��\x06r��ߑ\x01u.) of pipeline caused error: ("invalid expire time in \'psetex\' command",)')>
Traceback (most recent call last):
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 320, in async_run
    await self.main_task
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 361, in main
    await self._poll_iteration()
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 401, in _poll_iteration
    await self.heart_beat()
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 727, in heart_beat
    await self.run_cron(now, cron_window_size)
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/worker.py", line 759, in run_cron
    job_futures and await asyncio.gather(*job_futures)
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/arq/connections.py", line 177, in enqueue_job
    await pipe.execute()
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/client.py", line 1399, in execute
    return await conn.retry.call_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
    return await do()
           ^^^^^^^^^^
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/client.py", line 1287, in _execute_transaction
    self.raise_first_error(commands, response)
  File "/Users/matvey/Desktop/keep-oss/.venv/lib/python3.11/site-packages/redis/asyncio/client.py", line 1326, in raise_first_error
    raise r
redis.exceptions.ResponseError: Command # 1 (PSETEX arq:job:cron:process_background_ai_task:1726036281123 -319 ��H}�(�t�N�f��cron:process_background_ai_task��a�)�k�}��et��r��ߑu.) of pipeline caused error: ("invalid expire time in 'psetex' command",)
  1. Worker will stay in the "frozen" state not executing tasks.

Thank you for the library! ❤️

@Matvey-Kuk Matvey-Kuk marked this pull request as draft September 11, 2024 06:44
@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #479      +/-   ##
==========================================
- Coverage   96.27%   96.01%   -0.27%     
==========================================
  Files          11       11              
  Lines        1074     1078       +4     
  Branches      209      190      -19     
==========================================
+ Hits         1034     1035       +1     
- Misses         19       21       +2     
- Partials       21       22       +1     
Files with missing lines Coverage Δ
arq/worker.py 97.17% <ø> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f752e2...c7ab742. Read the comment docs.

@Matvey-Kuk Matvey-Kuk marked this pull request as ready for review September 11, 2024 07:05
@Matvey-Kuk
Copy link
Contributor Author

@Tinche @JonasKs I found you authored nearby code, wonder if you could review this, thank you :)

@Matvey-Kuk
Copy link
Contributor Author

@samuelcolvin ? :)

@Matvey-Kuk
Copy link
Contributor Author

@chrisguidry 👋

@chrisguidry
Copy link
Contributor

Thanks @Matvey-Kuk! I'll work on getting a release ready

@chrisguidry chrisguidry merged commit 3bacb07 into python-arq:main Jan 6, 2025
11 checks passed
@Matvey-Kuk
Copy link
Contributor Author

Thank you @chrisguidry ! Moving Arq back to prod <3

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants